Web-Based Newborn Screening System for Metabolic Diseases: Machine Learning Versus Clinicians
نویسندگان
چکیده
BACKGROUND A hospital information system (HIS) that integrates screening data and interpretation of the data is routinely requested by hospitals and parents. However, the accuracy of disease classification may be low because of the disease characteristics and the analytes used for classification. OBJECTIVE The objective of this study is to describe a system that enhanced the neonatal screening system of the Newborn Screening Center at the National Taiwan University Hospital. The system was designed and deployed according to a service-oriented architecture (SOA) framework under the Web services .NET environment. The system consists of sample collection, testing, diagnosis, evaluation, treatment, and follow-up services among collaborating hospitals. To improve the accuracy of newborn screening, machine learning and optimal feature selection mechanisms were investigated for screening newborns for inborn errors of metabolism. METHODS The framework of the Newborn Screening Hospital Information System (NSHIS) used the embedded Health Level Seven (HL7) standards for data exchanges among heterogeneous platforms integrated by Web services in the C# language. In this study, machine learning classification was used to predict phenylketonuria (PKU), hypermethioninemia, and 3-methylcrotonyl-CoA-carboxylase (3-MCC) deficiency. The classification methods used 347,312 newborn dried blood samples collected at the Center between 2006 and 2011. Of these, 220 newborns had values over the diagnostic cutoffs (positive cases) and 1557 had values that were over the screening cutoffs but did not meet the diagnostic cutoffs (suspected cases). The original 35 analytes and the manifested features were ranked based on F score, then combinations of the top 20 ranked features were selected as input features to support vector machine (SVM) classifiers to obtain optimal feature sets. These feature sets were tested using 5-fold cross-validation and optimal models were generated. The datasets collected in year 2011 were used as predicting cases. RESULTS The feature selection strategies were implemented and the optimal markers for PKU, hypermethioninemia, and 3-MCC deficiency were obtained. The results of the machine learning approach were compared with the cutoff scheme. The number of the false positive cases were reduced from 21 to 2 for PKU, from 30 to 10 for hypermethioninemia, and 209 to 46 for 3-MCC deficiency. CONCLUSIONS This SOA Web service-based newborn screening system can accelerate screening procedures effectively and efficiently. An SVM learning methodology for PKU, hypermethioninemia, and 3-MCC deficiency metabolic diseases classification, including optimal feature selection strategies, is presented. By adopting the results of this study, the number of suspected cases could be reduced dramatically.
منابع مشابه
Supervised machine learning techniques for the classification of metabolic disorders in newborns
MOTIVATION During the Bavarian newborn screening programme all newborns have been tested for about 20 inherited metabolic disorders. Owing to the amount and complexity of the generated experimental data, machine learning techniques provide a promising approach to investigate novel patterns in high-dimensional metabolic data which form the source for constructing classification rules with high d...
متن کاملClassification on High Dimensional Metabolic Data: Phenylketonuria as an Example
Tandem mass spectrometry is a promising new screening technology which permits screening within one analytical run not only for phenylketonuria (PKU) but also for a wide range of other metabolic disorders in newborns. We investigated two symbolic supervised machine learning techniques logistic regression analysis (LRA) and decision trees (DT), where the knowledge is represented in an explicit w...
متن کاملAnomaly-based Web Attack Detection: The Application of Deep Neural Network Seq2Seq With Attention Mechanism
Today, the use of the Internet and Internet sites has been an integrated part of the people’s lives, and most activities and important data are in the Internet websites. Thus, attempts to intrude into these websites have grown exponentially. Intrusion detection systems (IDS) of web attacks are an approach to protect users. But, these systems are suffering from such drawbacks as low accuracy in ...
متن کاملModelling of classification rules on metabolic patterns including machine learning and expert knowledge
Machine learning has a great potential to mine potential markers from high-dimensional metabolic data without any a priori knowledge. Exemplarily, we investigated metabolic patterns of three severe metabolic disorders, PAHD, MCADD, and 3-MCCD, on which we constructed classification models for disease screening and diagnosis using a decision tree paradigm and logistic regression analysis (LRA). ...
متن کاملA Random Forest Classifier based on Genetic Algorithm for Cardiovascular Diseases Diagnosis (RESEARCH NOTE)
Machine learning-based classification techniques provide support for the decision making process in the field of healthcare, especially in disease diagnosis, prognosis and screening. Healthcare datasets are voluminous in nature and their high dimensionality problem comprises in terms of slower learning rate and higher computational cost. Feature selection is expected to deal with the high dimen...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 15 شماره
صفحات -
تاریخ انتشار 2013